
Problem Note 58847: Reading SAS® Scalable Performance Data (SPD) Engine data with the serializer/deserializer (serde) from Hadoop Hive using Map-Reduce generates errors

The SAS Scalable Performance Data (SPD) Engine creates SAS® data sets on a Hadoop Distributed File System (HDFS). The serializer/deserializer (serde) provided by SAS enables read access to these tables directly from Hive.

SAS supplies an InputFormat with the serde. The InputFormat has custom methods to ensure that the data is split on even multiples of the SPD record length. This ensures that each split occurs on a record boundary and contains only complete records.
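The idea of splitting on even multiples of the record length can be sketched as follows. This is only an illustration under stated assumptions, not the SAS InputFormat itself: the function name `record_aligned_splits` and its parameters are hypothetical, and the sketch assumes a file that consists solely of fixed-length records.

```python
def record_aligned_splits(file_length, record_length, target_split_size):
    """Return (offset, length) pairs whose boundaries fall on record boundaries.

    Hypothetical helper for illustration; assumes fixed-length records
    and a file that holds a whole number of them.
    """
    if record_length <= 0:
        raise ValueError("record_length must be positive")
    if file_length % record_length != 0:
        raise ValueError("file_length is not a whole number of records")

    # Round the target size down to a whole number of records (at least one),
    # so every split size is an even multiple of the record length.
    records_per_split = max(1, target_split_size // record_length)
    split_size = records_per_split * record_length

    splits = []
    offset = 0
    while offset < file_length:
        # The last split may be shorter, but it is still a multiple of
        # record_length because file_length is.
        length = min(split_size, file_length - offset)
        splits.append((offset, length))
        offset += length
    return splits


if __name__ == "__main__":
    # 10 records of 100 bytes each, with a requested split size of 350 bytes.
    for offset, length in record_aligned_splits(1000, 100, 350):
        print(offset, length)
```

Because the requested size is rounded down to a whole number of records, every split in this sketch starts and ends on a record boundary, even when the requested size is not itself a multiple of the record length.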

When a Hive query does not require a Map-Reduce job, Hive uses the SAS InputFormat.

However, when Hive starts a Map-Reduce job to process a query, Map-Reduce uses its own InputFormat rather than the one provided by SAS. As a result, the splits passed to the Map-Reduce job might not fall on record boundaries and can start or end in the middle of a record. When this occurs, error messages like the following are generated:

NOTE: Invoking SASEP HCat Input MapReduce
ERROR: ERROR: java.lang.RuntimeException: ERROR: The table record length is not found in table properties.
ERROR: ERROR:  at com.sas.hadoop.serde.spde.hive.SPDInputFormat.setObsLength(SPDInputFormat.java:89)
ERROR: ERROR:  at com.sas.hadoop.serde.spde.hive.SPDInputFormat.getSplits(SPDInputFormat.java:56)
ERROR: ERROR:  at org.apache.hive.hcatalog.mapreduce.HCatBaseInputFormat.getSplits(HCatBaseInputFormat.java:162)
ERROR: ERROR:  at com.sas.access.hadoop.ca.input.hcat.HCatWrapperInputFormat.getSplits(HCatWrapperInputFormat.java:66)
ERROR: ERROR:  at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:624)
ERROR: ERROR:  at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:616)
ERROR: ERROR:  at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:492)
ERROR: ERROR:  at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1306)
ERROR: ERROR:  at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1303)
ERROR: ERROR:  at java.security.AccessController.doPrivileged(Native Method)
ERROR: ERROR:  at javax.security.auth.Subject.doAs(Subject.java:415)
ERROR: ERROR:  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
ERROR: ERROR:  at org.apache.hadoop.mapreduce.Job.submit(Job.java:1303)
ERROR: ERROR:  at com.dataflux.hadoop.DFHadoopMapReduce$1.run(DFHadoopMapReduce.java:425)
ERROR: ERROR:  at java.security.AccessController.doPrivileged(Native Method)
ERROR: ERROR:  at javax.security.auth.Subject.doAs(Subject.java:415)
ERROR: ERROR:  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
ERROR: ERROR:  at com.dataflux.hadoop.DFHadoopMapReduce.runMapReduce(DFHadoopMapReduce.java:311)
ERROR: java.lang.IllegalStateException: Job in state DEFINE instead of RUNNING
ERROR:  at org.apache.hadoop.mapreduce.Job.ensureState(Job.java:294)
ERROR:  at org.apache.hadoop.mapreduce.Job.isComplete(Job.java:619)
ERROR:  at com.dataflux.hadoop.DFHadoopMapReduce.waitForCompletion(DFHadoopMapReduce.java:490)
ERROR: Failed to run DS2INDB.
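
The misalignment described above can be illustrated with a small sketch. It is not taken from Hadoop or from the SAS serde: the helper names `fixed_size_splits` and `misaligned_offsets` are hypothetical, and the byte sizes are made up for the example. It only shows that splits cut at a fixed byte size, without regard to the record length, can begin in the middle of a record.

```python
def fixed_size_splits(file_length, split_size):
    """Cut a file into (offset, length) splits of a fixed byte size.

    Hypothetical stand-in for a splitter that ignores the record length.
    """
    if split_size <= 0:
        raise ValueError("split_size must be positive")
    splits = []
    offset = 0
    while offset < file_length:
        length = min(split_size, file_length - offset)
        splits.append((offset, length))
        offset += length
    return splits


def misaligned_offsets(splits, record_length):
    """Return the start offsets that fall in the middle of a record."""
    return [offset for offset, _ in splits if offset % record_length != 0]


if __name__ == "__main__":
    # 10 records of 100 bytes each, split every 256 bytes.
    splits = fixed_size_splits(1000, 256)
    print(misaligned_offsets(splits, 100))
```

With 100-byte records and 256-byte splits, every split after the first starts partway through a record, so a reader that assumes each split holds only complete records would misinterpret the data.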

Click the Hot Fix tab in this note to access the hot fix for this issue.



Operating System and Release Information

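Product Family | Product | System | Product Release Reported | Product Release Fixed* | SAS Release Reported | SAS Release Fixed*
SAS System | Base SAS | z/OS | 9.4_M3 | 9.4_M4 | 9.4 TS1M3 | 9.4 TS1M4
SAS System | Base SAS | z/OS 64-bit | 9.4_M3 | 9.4_M4 | 9.4 TS1M3 | 9.4 TS1M4
SAS System | Base SAS | Microsoft® Windows® for x64 | 9.4_M3 | 9.4_M4 | 9.4 TS1M3 | 9.4 TS1M4
SAS System | Base SAS | Microsoft Windows 8 Enterprise 32-bit | 9.4_M3 | 9.4_M4 | 9.4 TS1M3 | 9.4 TS1M4
SAS System | Base SAS | Microsoft Windows 8 Enterprise x64 | 9.4_M3 | 9.4_M4 | 9.4 TS1M3 | 9.4 TS1M4
SAS System | Base SAS | Microsoft Windows 8 Pro 32-bit | 9.4_M3 | 9.4_M4 | 9.4 TS1M3 | 9.4 TS1M4
SAS System | Base SAS | Microsoft Windows 8 Pro x64 | 9.4_M3 | 9.4_M4 | 9.4 TS1M3 | 9.4 TS1M4
SAS System | Base SAS | Microsoft Windows 8.1 Enterprise 32-bit | 9.4_M3 | 9.4_M4 | 9.4 TS1M3 | 9.4 TS1M4
SAS System | Base SAS | Microsoft Windows 8.1 Enterprise x64 | 9.4_M3 | 9.4_M4 | 9.4 TS1M3 | 9.4 TS1M4
SAS System | Base SAS | Microsoft Windows 8.1 Pro 32-bit | 9.4_M3 | 9.4_M4 | 9.4 TS1M3 | 9.4 TS1M4
SAS System | Base SAS | Microsoft Windows 8.1 Pro x64 | 9.4_M3 | 9.4_M4 | 9.4 TS1M3 | 9.4 TS1M4
SAS System | Base SAS | Microsoft Windows 10 | 9.4_M3 | 9.4_M4 | 9.4 TS1M3 | 9.4 TS1M4
SAS System | Base SAS | Microsoft Windows Server 2008 | 9.4_M3 | 9.4_M4 | 9.4 TS1M3 | 9.4 TS1M4
SAS System | Base SAS | Microsoft Windows Server 2008 R2 | 9.4_M3 | 9.4_M4 | 9.4 TS1M3 | 9.4 TS1M4
SAS System | Base SAS | Microsoft Windows Server 2008 for x64 | 9.4_M3 | 9.4_M4 | 9.4 TS1M3 | 9.4 TS1M4
SAS System | Base SAS | Microsoft Windows Server 2012 Datacenter | 9.4_M3 | 9.4_M4 | 9.4 TS1M3 | 9.4 TS1M4
SAS System | Base SAS | Microsoft Windows Server 2012 R2 Datacenter | 9.4_M3 | 9.4_M4 | 9.4 TS1M3 | 9.4 TS1M4
SAS System | Base SAS | Microsoft Windows Server 2012 R2 Std | 9.4_M3 | 9.4_M4 | 9.4 TS1M3 | 9.4 TS1M4
SAS System | Base SAS | Microsoft Windows Server 2012 Std | 9.4_M3 | 9.4_M4 | 9.4 TS1M3 | 9.4 TS1M4
SAS System | Base SAS | Windows 7 Enterprise 32 bit | 9.4_M3 | 9.4_M4 | 9.4 TS1M3 | 9.4 TS1M4
SAS System | Base SAS | Windows 7 Enterprise x64 | 9.4_M3 | 9.4_M4 | 9.4 TS1M3 | 9.4 TS1M4
SAS System | Base SAS | Windows 7 Home Premium 32 bit | 9.4_M3 | 9.4_M4 | 9.4 TS1M3 | 9.4 TS1M4
SAS System | Base SAS | Windows 7 Home Premium x64 | 9.4_M3 | 9.4_M4 | 9.4 TS1M3 | 9.4 TS1M4
SAS System | Base SAS | Windows 7 Professional 32 bit | 9.4_M3 | 9.4_M4 | 9.4 TS1M3 | 9.4 TS1M4
SAS System | Base SAS | Windows 7 Professional x64 | 9.4_M3 | 9.4_M4 | 9.4 TS1M3 | 9.4 TS1M4
SAS System | Base SAS | Windows 7 Ultimate 32 bit | 9.4_M3 | 9.4_M4 | 9.4 TS1M3 | 9.4 TS1M4
SAS System | Base SAS | Windows 7 Ultimate x64 | 9.4_M3 | 9.4_M4 | 9.4 TS1M3 | 9.4 TS1M4
SAS System | Base SAS | 64-bit Enabled AIX | 9.4_M3 | 9.4_M4 | 9.4 TS1M3 | 9.4 TS1M4
SAS System | Base SAS | 64-bit Enabled Solaris | 9.4_M3 | 9.4_M4 | 9.4 TS1M3 | 9.4 TS1M4
SAS System | Base SAS | HP-UX IPF | 9.4_M3 | 9.4_M4 | 9.4 TS1M3 | 9.4 TS1M4
SAS System | Base SAS | Linux for x64 | 9.4_M3 | 9.4_M4 | 9.4 TS1M3 | 9.4 TS1M4
SAS System | Base SAS | Solaris for x64 | 9.4_M3 | 9.4_M4 | 9.4 TS1M3 | 9.4 TS1M4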
* For software releases that are not yet generally available, the Fixed Release is the software release in which the problem is planned to be fixed.